WIP: Collisionless containers (DO NOT MERGE) #217

NorfairKing · 2018-10-15T05:55:10Z

No description provided.

treeowl · 2018-10-30T19:31:15Z

Data/HashMap/Base.hs

-      showString "fromList " . shows (toList m)
+-- instance (Show k, Show v) => Show (HashMap k v) where
+--     showsPrec d m = showParen (d > 10) $
+--       showString "fromList " . shows (toList m)


This instance must be restored!

vaibhavsagar · 2018-11-26T16:03:55Z

docs/developer-guide.md

+denial of service exploit. However, we can use the fact that `hashable` supports
+salted hashing by default to alleviate at least some of this problem.  In the
+current version, insertion time is not `O(number of collisions)` but `O(number
+of consequtive levels of collision)`.  While this is not better in case all the


s/consequtive/consecutive/

sjakobi · 2020-06-19T15:33:34Z

Hi @NorfairKing!

Since @emilypi and I have joined @treeowl and @tibbe as maintainers of this package, we've been discussing what to do about the HashDoS potential and what to make of this PR.

I currently think it's very intriguing, others are more sceptical.

So far the focus of this package and hashable has been very much on performance. hashable even explicitly recommends against using HashMaps for untrusted input:

http://hackage.haskell.org/package/hashable-1.3.0.0/docs/Data-Hashable.html#g:1

But as you are fully aware, some users still use u-c with untrusted data.

So if your approach can sufficiently reduce the HashDoS potential without reducing u-c's performance too much and without introducing new pitfalls or security holes, I believe it has a good chance of being merged. After all, a lot of users could benefit!

In any case, we have lots of questions. I'll start with my own:

1. Is this approach your own invention? Has it been tested in other hash map implementations or are there any (reviewed) publications about it?
2. Do you have a practical exploit that is fixed by this PR? If so, please share it with us maintainers privately.
3. How does this PR affect performance? We'll need benchmark results for this, both for "common" inputs, and inputs that could be used for a HashDoS. The most interesting operations to check are probably inserts and lookups. (For comparing the results, I recommend criterion-compare)
4. Is this PR "correct" in that it doesn't introduce any new bugs? Any tests that still need to be added?

sjakobi · 2020-06-19T15:35:48Z

@treeowl There are a lot of open "conversations" resulting from previous reviews of yours. Could you possibly hide those that have been resolved, by clicking on "Resolve conversation" for those?

NorfairKing · 2020-06-19T19:02:19Z

> Is this approach your own invention? Has it been tested in other hash map implementations or are there any (reviewed) publications about it?

The approach is new-ish. See https://stackoverflow.com/questions/53495133/is-this-approach-to-dealing-with-hash-collisions-new-unique

> Do you have a practical exploit that is fixed by this PR? If so, please share it with us maintainers privately.

Yes, I have >30K collisions in a ~1MiB JSON object that can keep a JSON-parsing server busy for minutes. I've already shared this with tibbe and treeowl.

> How does this PR affect performance? We'll need benchmark results for this, both for "common" inputs, and inputs that could be used for a HashDoS. The most interesting operations to check are probably inserts and lookups. (For comparing the results, I recommend criterion-compare)

We did the benchmarks as well and saw no difference. That makes sense because this implementation only differs when there are (many) conflicts. If I remember correctly it's in the package that I sent to treeowl and tibbe.

> Is this PR "correct" in that it doesn't introduce any new bugs? Any tests that still need to be added?

Well, the PR needs to be rebased on top of something more recent, but all the tests still passed when I last did that.

sjakobi · 2020-06-20T12:24:57Z

Many thanks for the quick response, @NorfairKing! :)

For now, I'd just like to let you know that our internal discussion is progressing further, and I think we'll be able to give you more feedback within the next few weeks.

NorfairKing · 2020-06-20T13:27:37Z

> Many thanks for the quick response, @NorfairKing! :)
>
> For now, I'd just like to let you know that our internal discussion is progressing further, and I think we'll be able to give you more feedback within the next few weeks.

Thanks for picking this up. I'm curious to hear more, but I am no longer being paid to work on this, so I will not be able to justify spending a lot of time on it.

sjakobi · 2020-06-20T13:43:34Z

Out of curiosity: Have you ever considered using hashmap instead, @NorfairKing? Since it uses Data.Map to store hash collisions, it should be more resistant against HashDoS IMU.

hashmap is currently deprecated of course, but it could be revamped.
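
To make the idea concrete, here is a minimal sketch of a hash map that keeps colliding keys in an ordered map. It is only an illustration under stated assumptions: `BucketMap`, `emptyB`, `insertB` and `lookupB` are hypothetical names, the hash function is passed explicitly, and none of this is the actual code of hashmap or unordered-containers. The point is that a bucket holding n colliding keys costs O(log n) per operation instead of O(n), at the price of an `Ord` constraint on keys.

```haskell
import qualified Data.IntMap.Strict as IM
import qualified Data.Map.Strict as M

-- | Hypothetical illustration type: buckets are keyed by hash value, and each
-- bucket is an ordered map holding all keys that share that hash.
newtype BucketMap k v = BucketMap (IM.IntMap (M.Map k v))

emptyB :: BucketMap k v
emptyB = BucketMap IM.empty

-- | Insert with an explicitly supplied hash function.
-- 'M.union' is left-biased, so a new value replaces an existing one.
insertB :: Ord k => (k -> Int) -> k -> v -> BucketMap k v -> BucketMap k v
insertB hash k v (BucketMap m) =
  BucketMap (IM.insertWith M.union (hash k) (M.singleton k v) m)

-- | Find the bucket by hash, then search the ordered map inside it.
lookupB :: Ord k => (k -> Int) -> k -> BucketMap k v -> Maybe v
lookupB hash k (BucketMap m) = IM.lookup (hash k) m >>= M.lookup k
```

Even with the worst possible hash, e.g. `const 0`, all keys land in one `Data.Map` bucket and lookups stay logarithmic in the number of keys.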

NorfairKing · 2020-06-20T13:50:50Z

@sjakobi I haven't. We did this work because aeson uses unordered-containers instead of containers, which makes each Haskell JSON server vulnerable. So unless you're able to convince the aeson maintainers to break their API and use another map type, it also doesn't matter to me.
The collisionless containers in this PR fix the problem without breaking the API or sacrificing performance.

TomMD · 2021-09-11T18:54:33Z

@NorfairKing Is there a reason this is still WIP and DO NOT MERGE?

@sjakobi The "internal discussion" seems to have gotten lost in the shuffle a year back. A timely fix would be good at this point. Can we merge?

treeowl · 2021-09-11T18:56:34Z

There's still considerable debate about this. Johan Tibell has said no, and he knows more about this package than anyone.


TomMD · 2021-09-11T19:00:09Z

> Johan Tibell has said no

Is there a link to that discussion so we can hear the objections first hand? As an open source project it would be traditional to have the objection available in the code review.

treeowl · 2021-09-11T19:01:51Z

I'm not sure. There was a lot of private discussion and a lot of things have fallen out of my cache. I'm sure he'd be happy to comment publicly.


NorfairKing · 2021-09-11T19:02:07Z

> Is there a link to that discussion so we can hear the objections first hand? As an open source project it would be traditional to have the objection available in the code review.

This was a private conversation as part of the responsible disclosure procedure.

Boarders · 2021-09-11T22:59:34Z

@NorfairKing You suggested this doesn't change performance, where are the benchmarks to show that?

NorfairKing · 2021-09-12T08:42:53Z

> @NorfairKing You suggested this doesn't change performance, where are the benchmarks to show that?

Yes, they were included in the original communication, but you can also just try to run the benchmarks locally to see.

Like I said:

> We did the benchmarks as well and saw no difference. That makes sense because this implementation only differs when there are (many) collisions. If I remember correctly it's in the package that I sent to treeowl and tibbe.

Bodigrim · 2021-09-12T09:32:42Z

@NorfairKing thanks for your work and investigation and this PR, nicely done!

@TomMD it seems a bit unfair to mount pressure on the maintainers of u-c: they are responsible neither for the choice of a hash function upstream, nor for downstream packages, which disregarded security warnings in the documentation. Performance of u-c is a difficult business, and validating such a change to the underlying data structure is no small feat. Especially given that this is a novel approach.

I guess from a pragmatic perspective it would be much simpler to replace Array by Map, but this imposes Ord. This is likely a non-issue for mainstream languages, so they never explored approaches avoiding Ord such as the proposed sequence of nested HashMaps, which I personally find very appealing and beautiful.

sjakobi · 2021-09-13T13:55:32Z

@NorfairKing Thanks for publishing this great write-up, and apologies for responding so slowly to your PR!

I'm happy that, now that the vulnerability is properly published, we can discuss it more easily in public.

I have opened #319, so we can discuss possible responses to the vulnerability separately from the proposed fix that you have provided in this PR. I invite anyone interested to participate in that discussion!

sjakobi · 2021-09-14T13:16:57Z

> @sjakobi The "internal discussion" seems to have gotten lost in the shuffle a year back. A timely fix would be good at this point. Can we merge?

Unfortunately we (at least @tibbe) are not convinced that this PR is of much help with preventing collision attacks. Citing #319 (comment):

> @tibbe's assessment was that it would still require a strong hash function and a random seed to be reasonably secure. With many weaker hash functions, it is possible to generate seed-independent collisions, see e.g. this blog post. By this assessment, #217 "adds" little security of its own.

This is no final judgment on this PR, but I don't currently expect that this PR will be part of a "timely fix" in u-c, nor do I currently know of a way to provide a quick fix within u-c itself at all.

If you're looking for potential mitigating measures, see #319 (comment) and #319 (comment).
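
As a toy illustration of what "seed-independent collisions" means, consider the sketch below. The hash here is deliberately trivial and is an assumption made for illustration only: `toyHash` and `collideUnder` are hypothetical names, and this is not the function used by hashable. Because the salt only shifts the result, any two strings with the same character sum collide no matter which salt is chosen, so randomizing the seed does not help.

```haskell
import Data.Char (ord)

-- | Toy salted hash (NOT hashable's function): the salt is simply added to
-- the sum of the character codes.
toyHash :: Int -> String -> Int
toyHash salt s = salt + sum (map ord s)

-- | True if the two keys hash to the same value under every given salt.
collideUnder :: [Int] -> String -> String -> Bool
collideUnder salts a b = all (\s -> toyHash s a == toyHash s b) salts
```

For example, `collideUnder [0 .. 1000] "ab" "ba"` holds because the two strings are anagrams. Real weak hashes need more work to attack, but the failure mode is the same.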

treeowl · 2021-09-14T13:20:59Z

I agree; this is a fair bit of extra complexity and maintenance burden for something we don't actually know for sure is helpful.

phadej · 2021-10-02T12:43:15Z

> I agree; this is a fair bit of extra complexity and maintenance burden for something we don't actually know for sure is helpful.

Hopefully this is a helpful argument: https://medium.com/@robertgrosse/generating-64-bit-hash-collisions-to-dos-python-5b21404a5306 Even though Robert (the author of that blog post) doesn't share the universal collision sets (i.e. sets that collide for all salts), he gets very far in showing that they are feasible to construct. FNV-1 is just a weak hash. Thus having hashmap with another hashmap for collision resolution is potentially very bad, as that chain simply will not terminate: It's enough to have just two keys which collide universally! The collision resolution method should be vastly different than reusing the (weak) hash function.
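
The non-termination argument can be sketched with a small model. Everything below is an assumption made for illustration: `SaltedHash`, `nextSalt`, `separationDepth` and the two toy hash functions are hypothetical and do not come from this PR or from hashable. The model asks how many nested levels (each using a fresh salt) are needed before two keys stop colliding, and gives up after a depth limit.

```haskell
import Data.Bits (xor)

-- | A salted hash function: salt first, then key.
type SaltedHash k = Int -> k -> Int

-- | Hypothetical rule for deriving the salt of the next nesting level.
nextSalt :: Int -> Int
nextSalt s = s + 1

-- | Number of nested levels needed until the two keys hash differently,
-- or 'Nothing' if they still collide after 'limit' levels.
separationDepth :: SaltedHash k -> Int -> Int -> k -> k -> Maybe Int
separationDepth h limit salt a b = go 0 salt
  where
    go depth s
      | depth >= limit = Nothing
      | h s a /= h s b = Just depth
      | otherwise = go (depth + 1) (nextSalt s)

-- | Toy hash where the salt interacts with the key, so re-salting can help.
resaltableHash :: SaltedHash Int
resaltableHash s k = (k + s * (k `div` 16)) `mod` 16

-- | Toy hash where keys equal modulo 16 collide under every salt.
universalHash :: SaltedHash Int
universalHash s k = (k `mod` 16) `xor` s
```

With these definitions, `separationDepth resaltableHash 10 0 1 17` is `Just 1`: the keys collide at the first level and separate at the second. `separationDepth universalHash 10 0 1 17` is `Nothing` for any limit, which models the chain of nested maps that never terminates.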

NorfairKing · 2021-10-02T15:02:23Z

Alright, that convinced me. Closing this for good.

sjakobi · 2021-10-03T13:36:03Z

> Thus having hashmap with another hashmap for collision resolution is potentially very bad, as that chain simply will not terminate: It's enough to have just two keys which collide universally!

That's a very good insight, @phadej. I hadn't realized this earlier.

Tom Sydney Kerckhove added 30 commits August 14, 2018 12:48

Intermediary commit

63b8861

Halfway through

c96f41e

Until unionWithKey

561214d

Until mapWithKey

69a5f95

until filterMapAux

aaf7dda

Finished a module

6098819

Something that compiles

209ee6b

Fixed double collisions

f5474dd

fixed some bugs

3db53fc

At least now the test completes

feeb8c4

Fixed the unions

e725b1f

Made strict hashmap tests pass

e687bba

Now without storing the salt anywhere

789dd5e

Merge branch 'dont-store-salts' into collisionless-containers

d562d8d

Putting back the show instance

5c6aa0f

Fixed the filter function as well, to not retain too many collisions

a1b5427

Fixed compilation warnings

9eb436e

woops

8edc380

Deleted a rogue comment

29d9f90

little comment

42dc26f

different hash

b46bbb0

Something that compiles

b883ecb

added back the always collision test, it does not loop infinitely now

992ca50

Fixed the strictness properties

dcec9f1

Got deletions working, sortof

c348ab5

Got adjust working, sort-of

07ed2f5

fixed deleteKeyExists, sort-of

3dda7e1

Fixed the lazy union tests

d4e09fa

Fixed the filter functions

65b6783

Simplified unconsing hashmaps

9c2deeb

treeowl requested changes Oct 30, 2018


Tom Sydney Kerckhove added 2 commits October 31, 2018 09:10

put back the show instance

6838fff

some docs

58af451

vaibhavsagar reviewed Nov 26, 2018


sjakobi added the performance label May 31, 2020

sjakobi mentioned this pull request Jun 2, 2020

Unordered-containers sometimes much, much slower than hashmap due to collisions. #121

Closed

NorfairKing mentioned this pull request Sep 11, 2021

Possible DOS due to hash collision haskell/aeson#864

Closed

sjakobi mentioned this pull request Sep 13, 2021

Vulnerability to collision attacks #319

Open

NorfairKing closed this Oct 2, 2021
